<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>tf::cudaFlow class | Taskflow QuickStart</title>
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
  <link rel="stylesheet" href="m-dark+documentation.compiled.css" />
  <link rel="icon" href="favicon.ico" type="image/x-icon" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
  <div class="m-container">
    <div class="m-row">
      <span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
        <a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
      </span>
      <div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
        <a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
          <path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
        </svg></a>
        <a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
        <a id="m-navbar-hide" href="#" title="Hide navigation"></a>
      </div>
      <div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
        <div class="m-row">
          <ol class="m-col-t-6 m-col-m-none">
            <li><a href="pages.html">Handbook</a></li>
            <li><a href="namespaces.html">Namespaces</a></li>
          </ol>
          <ol class="m-col-t-6 m-col-m-none" start="3">
            <li><a href="annotated.html">Classes</a></li>
            <li><a href="files.html">Files</a></li>
            <li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
              <use href="#m-doc-search-icon-path" />
            </svg></a></li>
          </ol>
        </div>
      </div>
    </div>
  </div>
</nav></header>
<main><article>
  <div class="m-container m-container-inflatable">
    <div class="m-row">
      <div class="m-col-l-10 m-push-l-1">
        <h1>
          <span class="m-breadcrumb"><a href="namespacetf.html">tf</a>::<wbr/></span>cudaFlow <span class="m-thin">class</span>
        </h1>
        <p>class to create a cudaFlow task dependency graph</p>
        <nav class="m-block m-default">
          <h3>Contents</h3>
          <ul>
            <li>
              Reference
              <ul>
                <li><a href="#typeless-methods">Constructors, destructors, conversion operators</a></li>
                <li><a href="#pub-methods">Public functions</a></li>
              </ul>
            </li>
          </ul>
        </nav>
<p>A cudaFlow is a high-level interface over CUDA <a href="classtf_1_1Graph.html" class="m-doc">Graph</a> for performing GPU operations using the task dependency graph model. The class provides methods for creating and launching different kinds of tasks on one or multiple CUDA devices, such as kernel tasks, data transfer tasks, and memory operation tasks. The following example creates a cudaFlow of two kernel tasks, <code>task1</code> and <code>task2</code>, where <code>task1</code> runs before <code>task2</code>.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">Taskflow</span><span class="w"> </span><span class="n">taskflow</span><span class="p">;</span>
<span class="n">tf</span><span class="o">::</span><span class="n">Executor</span><span class="w"> </span><span class="n">executor</span><span class="p">;</span>

<span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&amp;</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span>
<span class="w">  </span><span class="c1">// create two kernel tasks</span>
<span class="w">  </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">task1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">grid1</span><span class="p">,</span><span class="w"> </span><span class="n">block1</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size1</span><span class="p">,</span><span class="w"> </span><span class="n">kernel1</span><span class="p">,</span><span class="w"> </span><span class="n">args1</span><span class="p">);</span>
<span class="w">  </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">task2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">grid2</span><span class="p">,</span><span class="w"> </span><span class="n">block2</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size2</span><span class="p">,</span><span class="w"> </span><span class="n">kernel2</span><span class="p">,</span><span class="w"> </span><span class="n">args2</span><span class="p">);</span>

<span class="w">  </span><span class="c1">// kernel1 runs before kernel2</span>
<span class="w">  </span><span class="n">task1</span><span class="p">.</span><span class="n">precede</span><span class="p">(</span><span class="n">task2</span><span class="p">);</span>
<span class="p">});</span>

<span class="n">executor</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">taskflow</span><span class="p">).</span><span class="n">wait</span><span class="p">();</span></pre><p>A cudaFlow is a task (<a href="classtf_1_1Task.html" class="m-doc">tf::<wbr />Task</a>) created from <a href="classtf_1_1Taskflow.html" class="m-doc">tf::<wbr />Taskflow</a> and is run by <em>one</em> worker thread in the executor. That is, the callable that describes a cudaFlow is executed sequentially. Inside a cudaFlow task, different GPU tasks (<a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a>) may run in parallel, as scheduled by the CUDA runtime.</p><p>Please refer to <a href="GPUTaskingcudaFlow.html" class="m-doc">GPU Tasking (cudaFlow)</a> for details.</p>
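<p>Data transfer tasks can be combined with a kernel task in the same way. The following is a minimal sketch rather than a definitive recipe: the <code>scale</code> kernel, the buffer names, and the problem size are hypothetical, and it assumes the header path <code>taskflow/cuda/cudaflow.hpp</code> and a standalone cudaFlow that is offloaded explicitly through <code>run</code> on a stream created with the CUDA runtime. Error checking is omitted for brevity.</p><pre class="m-code">#include &lt;taskflow/cuda/cudaflow.hpp&gt;  // assumed header path
#include &lt;cstdio&gt;
#include &lt;vector&gt;

// hypothetical kernel: multiplies each of the n elements of x by a
__global__ void scale(float* x, float a, size_t n) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i &lt; n) {
    x[i] *= a;
  }
}

int main() {
  const size_t N = 1 &lt;&lt; 16;
  std::vector&lt;float&gt; hx(N, 1.0f);

  float* dx = nullptr;
  cudaMalloc(&amp;dx, N * sizeof(float));

  tf::cudaFlow cf;

  // host-to-device copy, kernel, device-to-host copy
  tf::cudaTask h2d = cf.copy(dx, hx.data(), N);
  tf::cudaTask mul = cf.kernel(dim3((N + 255) / 256), dim3(256), 0, scale, dx, 2.0f, N);
  tf::cudaTask d2h = cf.copy(hx.data(), dx, N);

  // the kernel runs after the upload and before the download
  h2d.precede(mul);
  mul.precede(d2h);

  // offload the graph through a stream and wait for it to finish
  cudaStream_t stream;
  cudaStreamCreate(&amp;stream);
  cf.run(stream);
  cudaStreamSynchronize(stream);

  std::printf("hx[0] = %f\n", hx[0]);

  cudaStreamDestroy(stream);
  cudaFree(dx);
  return 0;
}</pre><p>The typed <code>copy</code> method takes an element count, whereas <code>memcpy</code> takes a size in bytes.</p>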
        <section id="typeless-methods">
          <h2><a href="#typeless-methods">Constructors, destructors, conversion operators</a></h2>
          <dl class="m-doc">
            <dt id="ad4c3e001db151486c8479151a2108d37">
              <span class="m-doc-wrap-bumper"><a href="#ad4c3e001db151486c8479151a2108d37" class="m-doc-self">cudaFlow</a>(</span><span class="m-doc-wrap">)</span>
            </dt>
            <dd>constructs a cudaFlow</dd>
            <dt id="a828c3ab275521672e4ec6c78d3a9ee62">
              <span class="m-doc-wrap-bumper"><a href="#a828c3ab275521672e4ec6c78d3a9ee62" class="m-doc-self">~cudaFlow</a>(</span><span class="m-doc-wrap">) <span class="m-label m-flat m-info">defaulted</span></span>
            </dt>
            <dd>destroys the cudaFlow and its associated native CUDA graph and executable graph</dd>
            <dt id="a677a4b510abee2ac665193389b20f725">
              <span class="m-doc-wrap-bumper"><a href="#a677a4b510abee2ac665193389b20f725" class="m-doc-self">cudaFlow</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp;&amp;) <span class="m-label m-flat m-info">defaulted</span></span>
            </dt>
            <dd>default move constructor</dd>
          </dl>
        </section>
        <section id="pub-methods">
          <h2><a href="#pub-methods">Public functions</a></h2>
          <dl class="m-doc">
            <dt id="a81b65ab7cb9ec851f9435cc08252e678">
              <span class="m-doc-wrap-bumper">auto <a href="#a81b65ab7cb9ec851f9435cc08252e678" class="m-doc-self">operator=</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp;&amp;) -&gt; <a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp; <span class="m-label m-flat m-info">defaulted</span></span>
            </dt>
            <dd>default move assignment operator</dd>
            <dt id="a1926f45a038d8faa9c1b1ee43fd29a93">
              <span class="m-doc-wrap-bumper">auto <a href="#a1926f45a038d8faa9c1b1ee43fd29a93" class="m-doc-self">empty</a>(</span><span class="m-doc-wrap">) const -&gt; bool</span>
            </dt>
            <dd>queries the emptiness of the graph</dd>
            <dt id="ae6560c27d249af7e4b8b921388f5e1e2">
              <span class="m-doc-wrap-bumper">auto <a href="#ae6560c27d249af7e4b8b921388f5e1e2" class="m-doc-self">num_tasks</a>(</span><span class="m-doc-wrap">) const -&gt; size_t</span>
            </dt>
            <dd>queries the number of tasks</dd>
            <dt id="aad726dfe21e9719d96c65530a56d9951">
              <span class="m-doc-wrap-bumper">void <a href="#aad726dfe21e9719d96c65530a56d9951" class="m-doc-self">clear</a>(</span><span class="m-doc-wrap">)</span>
            </dt>
            <dd>clears the cudaFlow object</dd>
            <dt id="a7f97b68fa7c889db49b26aa71a46a7cf">
              <span class="m-doc-wrap-bumper">void <a href="#a7f97b68fa7c889db49b26aa71a46a7cf" class="m-doc-self">dump</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
            </dt>
            <dd>dumps the cudaFlow graph into a DOT format through an output stream</dd>
            <dt>
              <span class="m-doc-wrap-bumper">void <a href="#a43507f21eb9cb77667ffe0ac7e6ae635" class="m-doc">dump_native_graph</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
            </dt>
            <dd>dumps the native CUDA graph into a DOT format through an output stream</dd>
            <dt>
              <span class="m-doc-wrap-bumper">auto <a href="#a30b2e107cb2c90a37f467b28d1b42a74" class="m-doc">noop</a>(</span><span class="m-doc-wrap">) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a no-operation task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a060e1c96111c2134ce0f896420a42cd0" class="m-doc">host</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a host task that runs a callable on the host</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a02e4e5cf7d03b9d087d6fbf54eb86bbf" class="m-doc">host</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C&amp;&amp; callable)</span>
            </dt>
            <dd>updates parameters of a host task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
              dim3 b,
              size_t s,
              F f,
              ArgsT... args) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a kernel task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a821117dd640807bb7ec114b46888dfb1" class="m-doc">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              dim3 g,
              dim3 b,
              size_t shm,
              F f,
              ArgsT... args)</span>
            </dt>
            <dd>updates parameters of a kernel task</dd>
            <dt>
              <span class="m-doc-wrap-bumper">auto <a href="#a079ca65da35301e5aafd45878a19e9d2" class="m-doc">memset</a>(</span><span class="m-doc-wrap">void* dst,
              int v,
              size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a memset task that fills untyped data with a byte value</dd>
            <dt>
              <span class="m-doc-wrap-bumper">void <a href="#a082505f0fec89f65808421cdc737fb17" class="m-doc">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              void* dst,
              int ch,
              size_t count)</span>
            </dt>
            <dd>updates parameters of a memset task</dd>
            <dt>
              <span class="m-doc-wrap-bumper">auto <a href="#ad37637606f0643f360e9eda1f9a6e559" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap">void* tgt,
              const void* src,
              size_t bytes) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a memcpy task that copies untyped data in bytes</dd>
            <dt>
              <span class="m-doc-wrap-bumper">void <a href="#acf9e6cfa65cbfcd1d33c88e64b487ce6" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              void* tgt,
              const void* src,
              size_t bytes)</span>
            </dt>
            <dd>updates parameters of a memcpy task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc">zero</a>(</span><span class="m-doc-wrap">T* dst,
              size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a memset task that sets a typed memory block to zero</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a78c2a73243809e3cbd1955cc1ffe6477" class="m-doc">zero</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* dst,
              size_t count)</span>
            </dt>
            <dd>updates parameters of a memset task to a zero task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc">fill</a>(</span><span class="m-doc-wrap">T* dst,
              T value,
              size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a memset task that fills a typed memory block with a value</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a39ed97c9142959c73d4c25c34d71bd5e" class="m-doc">fill</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* dst,
              T value,
              size_t count)</span>
            </dt>
            <dd>updates parameters of a memset task to a fill task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#af03e04771b655f9e629eb4c22e19b19f" class="m-doc">copy</a>(</span><span class="m-doc-wrap">T* tgt,
              const T* src,
              size_t num) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a memcpy task that copies typed data</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a6cf6ec1e85172fa99c16bf0beffc0562" class="m-doc">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* tgt,
              const T* src,
              size_t num)</span>
            </dt>
            <dd>updates parameters of a memcpy task to a copy task</dd>
            <dt>
              <span class="m-doc-wrap-bumper">void <a href="#ae6810f7de27e5a347331aacfce67bea1" class="m-doc">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span>
            </dt>
            <dd>offloads the cudaFlow onto a GPU asynchronously via a stream</dd>
            <dt id="acfbee67cff7dc7c6297c20c64f2e015c">
              <span class="m-doc-wrap-bumper">auto <a href="#acfbee67cff7dc7c6297c20c64f2e015c" class="m-doc-self">native_graph</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraph_t</span>
            </dt>
            <dd>acquires a reference to the underlying CUDA graph</dd>
            <dt id="a5bfdaf621ab617ab5f0ca63466570256">
              <span class="m-doc-wrap-bumper">auto <a href="#a5bfdaf621ab617ab5f0ca63466570256" class="m-doc-self">native_executable</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraphExec_t</span>
            </dt>
            <dd>acquires a reference to the underlying CUDA graph executable</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#ac2906cb0002fc411a983d100a3d58d62" class="m-doc">single_task</a>(</span><span class="m-doc-wrap">C c) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>runs a callable with only a single kernel thread</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#add2d364f38c72322d8e36bc0da0b98e4" class="m-doc">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C c)</span>
            </dt>
            <dd>updates a single-threaded kernel task</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">for_each</a>(</span><span class="m-doc-wrap">I first,
              I last,
              C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>applies a callable to each dereferenced element of the data array</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#af9cc7ee16602754929bb9118a9d7f0b2" class="m-doc">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              C callable)</span>
            </dt>
            <dd>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a></dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap">I first,
              I last,
              I step,
              C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>applies a callable to each index in the range with the step size</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a3fa7f8e38b4da1fe0cbcfb265f9349a2" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              I step,
              C callable)</span>
            </dt>
            <dd>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a></dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I first,
              I last,
              O output,
              C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>applies a callable to a source range and stores the result in a target range</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a4a211b1f8562e10f9aae8b44fd6acdec" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              O output,
              C c)</span>
            </dt>
            <dd>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I1 first1,
              I1 last1,
              I2 first2,
              O output,
              C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>creates a task to perform parallel transforms over two ranges of items</dd>
            <dt>
              <div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#a7c6ca7be2b6908e8f71570c54303ba9e" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I1 first1,
              I1 last1,
              I2 first2,
              O output,
              C c)</span>
            </dt>
            <dd>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">auto <a href="#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc">capture</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
            </dt>
            <dd>constructs a subflow graph through <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a></dd>
            <dt>
              <div class="m-doc-template">template&lt;typename C&gt;</div>
              <span class="m-doc-wrap-bumper">void <a href="#aa0f182dc0fa99bcc9118311925fddca5" class="m-doc">capture</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C callable)</span>
            </dt>
            <dd>updates the captured child graph</dd>
          </dl>
        </section>
        <section>
          <h2>Function documentation</h2>
          <section class="m-doc-details" id="a43507f21eb9cb77667ffe0ac7e6ae635"><div>
            <h3>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a43507f21eb9cb77667ffe0ac7e6ae635" class="m-doc-self">dump_native_graph</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span></span>
            </h3>
            <p>dumps the native CUDA graph into a DOT format through an output stream</p>
<p>The native CUDA graph may be different from the upper-level cudaFlow graph when flow capture is involved.</p>
          </div></section>
          <section class="m-doc-details" id="a30b2e107cb2c90a37f467b28d1b42a74"><div>
            <h3>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a30b2e107cb2c90a37f467b28d1b42a74" class="m-doc-self">noop</a>(</span><span class="m-doc-wrap">)</span></span>
            </h3>
            <p>creates a no-operation task</p>
            <table class="m-table m-fullwidth m-flat">
              <tfoot>
                <tr>
                  <th style="width: 1%">Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>An empty node performs no operation during execution, but can be used for transitive ordering. For example, a phased execution graph with two groups of <code>n</code> nodes and a barrier between them can be represented using one empty node and <code>2*n</code> dependency edges, rather than <code>n^2</code> dependency edges without an empty node.</p>
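<p>The barrier pattern described above might look like the following sketch. The two kernels, the buffer names, and the value of <code>n</code> are hypothetical, and the header path and the standalone use of <code>run</code> on a raw CUDA stream are assumptions. Each phase-2 kernel reads a slot written by a different phase-1 kernel, so every writer must finish before any reader starts.</p><pre class="m-code">#include &lt;taskflow/cuda/cudaflow.hpp&gt;  // assumed header path
#include &lt;cstdio&gt;

// hypothetical phase-1 kernel: writes v into slot i
__global__ void write_slot(int* data, int i, int v) {
  data[i] = v;
}

// hypothetical phase-2 kernel: reads the neighbouring slot written in phase 1
__global__ void read_neighbor(int* in, int* out, int i, int n) {
  out[i] = in[(i + 1) % n];
}

int main() {
  const int n = 4;
  int* in = nullptr;
  int* out = nullptr;
  cudaMalloc(&amp;in, n * sizeof(int));
  cudaMalloc(&amp;out, n * sizeof(int));

  tf::cudaFlow cf;
  tf::cudaTask barrier = cf.noop();

  for (int i = 0; i &lt; n; ++i) {
    tf::cudaTask writer = cf.kernel(1, 1, 0, write_slot, in, i, i * 10);
    tf::cudaTask reader = cf.kernel(1, 1, 0, read_neighbor, in, out, i, n);
    writer.precede(barrier);  // n edges into the barrier
    barrier.precede(reader);  // n edges out of the barrier
  }

  cudaStream_t stream;
  cudaStreamCreate(&amp;stream);
  cf.run(stream);
  cudaStreamSynchronize(stream);

  std::printf("tasks in the cudaFlow: %zu\n", cf.num_tasks());

  cudaStreamDestroy(stream);
  cudaFree(in);
  cudaFree(out);
  return 0;
}</pre><p>Connecting every writer directly to every reader would express the same ordering with <code>n*n</code> calls to <code>precede</code> instead of <code>2*n</code>.</p>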
          </div></section>
          <section class="m-doc-details" id="a060e1c96111c2134ce0f896420a42cd0"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a060e1c96111c2134ce0f896420a42cd0" class="m-doc-self">host</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable)</span></span>
            </h3>
            <p>creates a host task that runs a callable on the host</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">C</td>
                  <td>callable type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>callable</td>
                  <td>a callable object that takes no arguments and returns nothing (i.e., one from which a <code>std::function&lt;void()&gt;</code> can be constructed)</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A host task can only execute CPU-specific functions and must not make any CUDA calls (e.g., <code>cudaMalloc</code>).</p>
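<p>As an illustration, the sketch below runs a host task after a <code>zero</code> task has cleared a device buffer. The buffer, the counter, and the task names are hypothetical, and the header path and the standalone use of <code>run</code> on a raw CUDA stream are assumptions. The callable only touches host state and makes no CUDA calls; the captured variable must outlive the execution of the graph.</p><pre class="m-code">#include &lt;taskflow/cuda/cudaflow.hpp&gt;  // assumed header path
#include &lt;cstdio&gt;

int main() {
  const size_t N = 1024;
  int* d = nullptr;
  cudaMalloc(&amp;d, N * sizeof(int));

  int completed = 0;  // host-side state updated by the host task

  tf::cudaFlow cf;

  // GPU task: set N integers to zero
  tf::cudaTask clear = cf.zero(d, N);

  // host task: CPU-only work, no CUDA calls inside the callable
  tf::cudaTask notify = cf.host([&amp;completed]() { ++completed; });

  clear.precede(notify);

  cudaStream_t stream;
  cudaStreamCreate(&amp;stream);
  cf.run(stream);
  cudaStreamSynchronize(stream);

  std::printf("completed = %d\n", completed);

  cudaStreamDestroy(stream);
  cudaFree(d);
  return 0;
}</pre>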
          </div></section>
          <section class="m-doc-details" id="a02e4e5cf7d03b9d087d6fbf54eb86bbf"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a02e4e5cf7d03b9d087d6fbf54eb86bbf" class="m-doc-self">host</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C&amp;&amp; callable)</span></span>
            </h3>
            <p>updates parameters of a host task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a060e1c96111c2134ce0f896420a42cd0" class="m-doc">tf::<wbr />cudaFlow::<wbr />host</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eab9361011891280a44d85b967739cc6a5" class="m-doc">tf::<wbr />cudaTaskType::<wbr />HOST</a>.</p>
          </div></section>
          <section class="m-doc-details" id="a68f666503d13a7b80fb7399fb2f0c153"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename F, typename... ArgsT&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
              dim3 b,
              size_t s,
              F f,
              ArgsT... args)</span></span>
            </h3>
            <p>creates a kernel task</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">F</td>
                  <td>kernel function type</td>
                </tr>
                <tr>
                  <td>ArgsT</td>
                  <td>kernel function parameter types</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>g</td>
                  <td>configured grid</td>
                </tr>
                <tr>
                  <td>b</td>
                  <td>configured block</td>
                </tr>
                <tr>
                  <td>s</td>
                  <td>configured shared memory size in bytes</td>
                </tr>
                <tr>
                  <td>f</td>
                  <td>kernel function</td>
                </tr>
                <tr>
                  <td>args</td>
                  <td>arguments to forward to the kernel function by copy</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
          </div></section>
          <section class="m-doc-details" id="a821117dd640807bb7ec114b46888dfb1"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename F, typename... ArgsT&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a821117dd640807bb7ec114b46888dfb1" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              dim3 g,
              dim3 b,
              size_t shm,
              F f,
              ArgsT... args)</span></span>
            </h3>
            <p>updates parameters of a kernel task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc">tf::<wbr />cudaFlow::<wbr />kernel</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea35c10219c45ccfb5b07444fd7e17214c" class="m-doc">tf::<wbr />cudaTaskType::<wbr />KERNEL</a>. The kernel function name must NOT change.</p>
          </div></section>
          <section class="m-doc-details" id="a079ca65da35301e5aafd45878a19e9d2"><div>
            <h3>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a079ca65da35301e5aafd45878a19e9d2" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap">void* dst,
              int v,
              size_t count)</span></span>
            </h3>
            <p>creates a memset task that fills untyped data with a byte value</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">dst</td>
                  <td>pointer to the destination device memory area</td>
                </tr>
                <tr>
                  <td>v</td>
                  <td>value to set for each byte of specified memory</td>
                </tr>
                <tr>
                  <td>count</td>
                  <td>size in bytes to set</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A memset task fills the first <code>count</code> bytes of the device memory area pointed to by <code>dst</code> with the byte value <code>v</code>.</p>
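<p>Because the fill is byte-wise, writing <code>v = 1</code> over 32-bit integers does not produce the integer 1. The following is a minimal host-side sketch of that behavior using <code>std::memset</code>; it is not Taskflow code, and <code>memset_model</code> and <code>first_word_after_memset</code> are hypothetical names introduced only for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side model of the byte-wise fill described above (NOT Taskflow code):
// each of the first `count` bytes at `dst` receives the low byte of `v`.
inline void memset_model(void* dst, int v, std::size_t count) {
  assert(dst != nullptr || count == 0);
  std::memset(dst, v, count);
}

// Hypothetical helper: fills four 32-bit integers byte by byte and returns
// the resulting value of the first element.
inline std::uint32_t first_word_after_memset(int v) {
  std::vector<std::uint32_t> data(4, 0u);
  memset_model(data.data(), v, data.size() * sizeof(std::uint32_t));
  return data[0];
}
```

<p>In this model, <code>first_word_after_memset(1)</code> yields <code>0x01010101</code>, since every byte of the word is set to 1.</p>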
          </div></section>
          <section class="m-doc-details" id="a082505f0fec89f65808421cdc737fb17"><div>
            <h3>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a082505f0fec89f65808421cdc737fb17" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              void* dst,
              int ch,
              size_t count)</span></span>
            </h3>
            <p>updates parameters of a memset task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a079ca65da35301e5aafd45878a19e9d2" class="m-doc">tf::<wbr />cudaFlow::<wbr />memset</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>. The destination memory may have a different address but must be allocated from the same context as the original destination memory.</p>
          </div></section>
          <section class="m-doc-details" id="ad37637606f0643f360e9eda1f9a6e559"><div>
            <h3>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ad37637606f0643f360e9eda1f9a6e559" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap">void* tgt,
              const void* src,
              size_t bytes)</span></span>
            </h3>
            <p>creates a memcpy task that copies untyped data in bytes</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">tgt</td>
                  <td>pointer to the target memory block</td>
                </tr>
                <tr>
                  <td>src</td>
                  <td>pointer to the source memory block</td>
                </tr>
                <tr>
                  <td>bytes</td>
                  <td>bytes to copy</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A memcpy task transfers <code>bytes</code> of data from a source location to a target location. Direction can be arbitrary among CPUs and GPUs.</p>
          </div></section>
          <section class="m-doc-details" id="acf9e6cfa65cbfcd1d33c88e64b487ce6"><div>
            <h3>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#acf9e6cfa65cbfcd1d33c88e64b487ce6" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              void* tgt,
              const void* src,
              size_t bytes)</span></span>
            </h3>
            <p>updates parameters of a memcpy task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#ad37637606f0643f360e9eda1f9a6e559" class="m-doc">tf::<wbr />cudaFlow::<wbr />memcpy</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eac5d10cc70cce96265c445f14e7f5aba4" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMCPY</a>. The source/destination memory may have different address values but must be allocated from the same contexts as the original source/destination memory.</p>
          </div></section>
          <section class="m-doc-details" id="a40172fac4464f6d805f75921ea3c2a3b"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc-self">zero</a>(</span><span class="m-doc-wrap">T* dst,
              size_t count)</span></span>
            </h3>
            <p>creates a memset task that sets a typed memory block to zero</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">T</td>
                  <td>element type (size of <code>T</code> must be either 1, 2, or 4)</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>dst</td>
                  <td>pointer to the destination device memory area</td>
                </tr>
                <tr>
                  <td>count</td>
                  <td>number of elements</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A zero task zeroes the first <code>count</code> elements of type <code>T</code> in the device memory area pointed to by <code>dst</code>.</p>
          </div></section>
          <section class="m-doc-details" id="a78c2a73243809e3cbd1955cc1ffe6477"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a78c2a73243809e3cbd1955cc1ffe6477" class="m-doc-self">zero</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* dst,
              size_t count)</span></span>
            </h3>
            <p>updates parameters of a memset task to a zero task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc">tf::<wbr />cudaFlow::<wbr />zero</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>.</p><p>The destination memory may have a different address but must be allocated from the same context as the original destination memory.</p>
          </div></section>
          <section class="m-doc-details" id="a21d4447bc834f4d3e1bb4772c850d090"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc-self">fill</a>(</span><span class="m-doc-wrap">T* dst,
              T value,
              size_t count)</span></span>
            </h3>
            <p>creates a memset task that fills a typed memory block with a value</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">T</td>
                  <td>element type (size of <code>T</code> must be either 1, 2, or 4)</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>dst</td>
                  <td>pointer to the destination device memory area</td>
                </tr>
                <tr>
                  <td>value</td>
                  <td>value to fill for each element of type <code>T</code></td>
                </tr>
                <tr>
                  <td>count</td>
                  <td>number of elements</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A fill task fills the first <code>count</code> elements of type <code>T</code> with <code>value</code> in the device memory area pointed to by <code>dst</code>. The value is interpreted as an element of type <code>T</code> rather than as a single byte.</p>
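<p>The element-wise interpretation can be sketched on the host with <code>std::fill_n</code>. This is an illustrative model, not Taskflow code: <code>fill_model</code> and <code>sum_after_fill</code> are hypothetical names, and the sketch uses <code>std::is_trivially_copyable_v</code> as a stand-in for the <code>is_pod_v</code> trait shown in the signature.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Host-side model of a typed fill (NOT Taskflow code). The constraint mirrors
// the documented requirement that sizeof(T) is 1, 2, or 4; the trait used here
// is an assumption standing in for is_pod_v.
template <typename T,
          std::enable_if_t<std::is_trivially_copyable_v<T> &&
                           (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4),
                           void>* = nullptr>
void fill_model(T* dst, T value, std::size_t count) {
  assert(dst != nullptr || count == 0);
  std::fill_n(dst, count, value);  // each element becomes `value`, not a repeated byte
}

// Hypothetical helper: fills four 32-bit integers with `value` and sums them.
inline std::uint32_t sum_after_fill(std::uint32_t value) {
  std::vector<std::uint32_t> data(4, 0u);
  fill_model(data.data(), value, data.size());
  std::uint32_t sum = 0;
  for (std::uint32_t x : data) sum += x;
  return sum;
}
```

<p>In this model, filling four integers with the value 1 gives four elements equal to 1, so <code>sum_after_fill(1)</code> is 4. An 8-byte type such as <code>double</code> would be rejected at compile time by the size constraint.</p>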
          </div></section>
          <section class="m-doc-details" id="a39ed97c9142959c73d4c25c34d71bd5e"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a39ed97c9142959c73d4c25c34d71bd5e" class="m-doc-self">fill</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* dst,
              T value,
              size_t count)</span></span>
            </h3>
            <p>updates parameters of a memset task to a fill task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc">tf::<wbr />cudaFlow::<wbr />fill</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>.</p><p>The destination memory may have a different address but must be allocated from the same context as the original destination memory.</p>
          </div></section>
          <section class="m-doc-details" id="af03e04771b655f9e629eb4c22e19b19f"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af03e04771b655f9e629eb4c22e19b19f" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap">T* tgt,
              const T* src,
              size_t num)</span></span>
            </h3>
            <p>creates a memcpy task that copies typed data</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">T</td>
                  <td>element type (non-void)</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>tgt</td>
                  <td>pointer to the target memory block</td>
                </tr>
                <tr>
                  <td>src</td>
                  <td>pointer to the source memory block</td>
                </tr>
                <tr>
                  <td>num</td>
                  <td>number of elements to copy</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A copy task transfers <code>num*sizeof(T)</code> bytes of data from a source location to a target location. Direction can be arbitrary among CPUs and GPUs.</p>
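<p>The relationship between the element count and the number of bytes moved can be sketched on the host as follows. This is an illustrative model built on <code>std::memcpy</code>, not Taskflow code; <code>copy_model</code> and <code>bytes_for_doubles</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Host-side model of a typed copy (NOT Taskflow code): `num` counts elements,
// so the number of bytes moved is num * sizeof(T).
template <typename T>
std::size_t copy_model(T* tgt, const T* src, std::size_t num) {
  static_assert(std::is_trivially_copyable_v<T>, "memcpy needs trivially copyable T");
  assert((tgt != nullptr && src != nullptr) || num == 0);
  const std::size_t bytes = num * sizeof(T);
  if (bytes != 0) std::memcpy(tgt, src, bytes);
  return bytes;  // returned only so the byte count can be inspected
}

// Hypothetical helper: copies `num` doubles and reports how many bytes moved.
inline std::size_t bytes_for_doubles(std::size_t num) {
  std::vector<double> src(num, 1.5), tgt(num, 0.0);
  return copy_model(tgt.data(), src.data(), num);
}
```

<p>Passing an element count rather than a byte count is what distinguishes this typed form from the untyped <code>memcpy</code> form, where the caller computes the byte size.</p>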
          </div></section>
          <section class="m-doc-details" id="a6cf6ec1e85172fa99c16bf0beffc0562"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a6cf6ec1e85172fa99c16bf0beffc0562" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              T* tgt,
              const T* src,
              size_t num)</span></span>
            </h3>
            <p>updates parameters of a memcpy task to a copy task</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#af03e04771b655f9e629eb4c22e19b19f" class="m-doc">tf::<wbr />cudaFlow::<wbr />copy</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eac5d10cc70cce96265c445f14e7f5aba4" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMCPY</a>. The source/destination memory may have different address values but must be allocated from the same contexts as the original source/destination memory.</p>
          </div></section>
          <section class="m-doc-details" id="ae6810f7de27e5a347331aacfce67bea1"><div>
            <h3>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ae6810f7de27e5a347331aacfce67bea1" class="m-doc-self">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span></span>
            </h3>
            <p>offloads the cudaFlow onto a GPU asynchronously via a stream</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">stream</td>
                  <td>stream for performing this operation</td>
                </tr>
              </tbody>
            </table>
<p>Offloads the present cudaFlow onto a GPU asynchronously via the given stream.</p><p>Offloading a cudaFlow forces the underlying graph to be instantiated. After instantiation, you should not modify the graph topology; you may only update node parameters.</p>
          </div></section>
          <section class="m-doc-details" id="ac2906cb0002fc411a983d100a3d58d62"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ac2906cb0002fc411a983d100a3d58d62" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap">C c)</span></span>
            </h3>
            <p>runs a callable with only a single kernel thread</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">C</td>
                  <td>callable type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>c</td>
                  <td>callable to run by a single kernel thread</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
          </div></section>
          <section class="m-doc-details" id="add2d364f38c72322d8e36bc0da0b98e4"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#add2d364f38c72322d8e36bc0da0b98e4" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C c)</span></span>
            </h3>
            <p>updates a single-threaded kernel task</p>
<p>This method is similar to <a href="classtf_1_1cudaFlow.html#ac2906cb0002fc411a983d100a3d58d62" class="m-doc">cudaFlow::<wbr />single_task</a> but operates on an existing task.</p>
          </div></section>
          <section class="m-doc-details" id="a1a681f6223853b6445dcfdad07e4d0fd"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap">I first,
              I last,
              C callable)</span></span>
            </h3>
            <p>applies a callable to each dereferenced element of the data array</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">I</td>
                  <td>iterator type</td>
                </tr>
                <tr>
                  <td>C</td>
                  <td>callable type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>first</td>
                  <td>iterator to the beginning (inclusive)</td>
                </tr>
                <tr>
                  <td>last</td>
                  <td>iterator to the end (exclusive)</td>
                </tr>
                <tr>
                  <td>callable</td>
                  <td>a callable object to apply to the dereferenced iterator</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">itr</span><span class="p">);</span>
<span class="p">}</span></pre>
          </div></section>
          <section class="m-doc-details" id="af9cc7ee16602754929bb9118a9d7f0b2"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af9cc7ee16602754929bb9118a9d7f0b2" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              C callable)</span></span>
            </h3>
            <p>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a></p>
<p>The type of the iterators and the callable must be the same as the task created from <a href="classtf_1_1cudaFlow.html#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a>.</p>
          </div></section>
          <section class="m-doc-details" id="a34f1ea89e5651faa6e8af522a42556ac"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap">I first,
              I last,
              I step,
              C callable)</span></span>
            </h3>
            <p>applies a callable to each index in the range with the step size</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">I</td>
                  <td>index type</td>
                </tr>
                <tr>
                  <td>C</td>
                  <td>callable type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>first</td>
                  <td>beginning index (inclusive)</td>
                </tr>
                <tr>
                  <td>last</td>
                  <td>ending index (exclusive)</td>
                </tr>
                <tr>
                  <td>step</td>
                  <td>step size</td>
                </tr>
                <tr>
                  <td>callable</td>
                  <td>the callable to apply to each index in the range</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="c1">// step is positive [first, last)</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&lt;</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// step is negative [first, last)</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&gt;</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span></pre>
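<p>To see which indices the two loops above visit, the following sequential host-side reference may help. It is a sketch of the index semantics only, not Taskflow code, and it does not model parallel execution; <code>for_each_index_model</code> and <code>visited</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <vector>

// Sequential host-side reference for the two loops above (NOT Taskflow code).
template <typename I, typename C>
void for_each_index_model(I first, I last, I step, C callable) {
  assert(step != 0);  // a zero step would never terminate
  if (step > 0) {
    for (I i = first; i < last; i += step) callable(i);
  } else {
    for (I i = first; i > last; i += step) callable(i);
  }
}

// Hypothetical helper: records the visited indices in sequential order.
inline std::vector<int> visited(int first, int last, int step) {
  std::vector<int> out;
  for_each_index_model(first, last, step, [&out](int i) { out.push_back(i); });
  return out;
}
```

<p>In this model, <code>visited(0, 10, 3)</code> yields 0, 3, 6, 9 and <code>visited(10, 0, -3)</code> yields 10, 7, 4, 1. In both cases <code>last</code> itself is never visited. On a GPU the same set of indices is processed, but no ordering between them should be assumed.</p>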
          </div></section>
          <section class="m-doc-details" id="a3fa7f8e38b4da1fe0cbcfb265f9349a2"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a3fa7f8e38b4da1fe0cbcfb265f9349a2" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              I step,
              C callable)</span></span>
            </h3>
            <p>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a></p>
<p>The type of the indices and the callable must be the same as the task created from <a href="classtf_1_1cudaFlow.html#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a>.</p>
          </div></section>
          <section class="m-doc-details" id="af89a9bda182272462a0eda2581536cd8"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename O, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I first,
              I last,
              O output,
              C op)</span></span>
            </h3>
            <p>applies a callable to a source range and stores the result in a target range</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">I</td>
                  <td>input iterator type</td>
                </tr>
                <tr>
                  <td>O</td>
                  <td>output iterator type</td>
                </tr>
                <tr>
                  <td>C</td>
                  <td>unary operator type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>first</td>
                  <td>iterator to the beginning of the input range</td>
                </tr>
                <tr>
                  <td>last</td>
                  <td>iterator to the end of the input range</td>
                </tr>
                <tr>
                  <td>output</td>
                  <td>iterator to the beginning of the output range</td>
                </tr>
                <tr>
                  <td>op</td>
                  <td>the operator to apply to transform each element in the range</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first</span><span class="o">++</span><span class="p">);</span>
<span class="p">}</span></pre>
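<p>As a minimal host-side sketch of these semantics (not Taskflow's GPU implementation), the same loop can be written as a plain C++ function template and used to cross-check device results on the CPU. The names <code>transform_reference</code> and <code>square_all</code> are hypothetical and not part of the Taskflow API. Because the GPU version processes elements in parallel, the sketch assumes <code>op</code> does not depend on the order in which elements are visited.</p><pre class="m-code">#include &lt;vector&gt;

// Sequential host-side model of the loop above (hypothetical helper).
// It visits elements in order; the GPU task gives no such ordering.
template &lt;typename I, typename O, typename C&gt;
O transform_reference(I first, I last, O output, C op) {
  while (first != last) {
    *output++ = op(*first++);
  }
  return output;  // one past the last element written
}

// Hypothetical usage: squares every element of a host vector.
inline std::vector&lt;int&gt; square_all(const std::vector&lt;int&gt;&amp; in) {
  std::vector&lt;int&gt; out(in.size());
  transform_reference(in.begin(), in.end(), out.begin(),
                      [](int x) { return x * x; });
  return out;
}</pre>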
          </div></section>
          <section class="m-doc-details" id="a4a211b1f8562e10f9aae8b44fd6acdec"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I, typename O, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a4a211b1f8562e10f9aae8b44fd6acdec" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I first,
              I last,
              O output,
              C c)</span></span>
            </h3>
            <p>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></p>
<p>The iterator and callable types must match those of the task originally created by <a href="classtf_1_1cudaFlow.html#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a>.</p>
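<p>One way to read the type requirement, sketched in plain host C++: every lambda expression has its own closure type, so two lambdas with identical bodies still differ in type, whereas all objects of a named functor share one type. Under that reading, an update call would reuse the original callable object or a named functor type. In device code the call operator would additionally need a <code>__device__</code> qualifier, which is omitted here so the sketch compiles with a host compiler. <code>AddOne</code> and the two helper functions are hypothetical names.</p><pre class="m-code">#include &lt;type_traits&gt;

// Named functor: every AddOne object has the same static type.
struct AddOne {
  int operator()(int x) const { return x + 1; }
};

// True: two objects of a named functor share one type.
inline bool same_functor_type() {
  AddOne a, b;
  return std::is_same&lt;decltype(a), decltype(b)&gt;::value;
}

// False: each lambda expression yields a distinct closure type,
// even when the bodies are textually identical.
inline bool same_lambda_type() {
  auto f = [](int x) { return x + 1; };
  auto g = [](int x) { return x + 1; };
  return std::is_same&lt;decltype(f), decltype(g)&gt;::value;
}</pre>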
          </div></section>
          <section class="m-doc-details" id="abab2bfdfc86ef3a764ece4743fdede76"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I1, typename I2, typename O, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I1 first1,
              I1 last1,
              I2 first2,
              O output,
              C op)</span></span>
            </h3>
            <p>creates a task to perform parallel transforms over two ranges of items</p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">I1</td>
                  <td>first input iterator type</td>
                </tr>
                <tr>
                  <td>I2</td>
                  <td>second input iterator type</td>
                </tr>
                <tr>
                  <td>O</td>
                  <td>output iterator type</td>
                </tr>
                <tr>
                  <td>C</td>
                  <td>binary operator type</td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>first1</td>
                  <td>iterator to the beginning of the first input range</td>
                </tr>
                <tr>
                  <td>last1</td>
                  <td>iterator to the end of the first input range</td>
                </tr>
                <tr>
                  <td>first2</td>
                  <td>iterator to the beginning of the second input range</td>
                </tr>
                <tr>
                  <td>output</td>
                  <td>iterator to the beginning of the output range</td>
                </tr>
                <tr>
                  <td>op</td>
                  <td>binary operator to apply to transform each pair of items in the two input ranges</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">first2</span><span class="o">++</span><span class="p">);</span>
<span class="p">}</span></pre>
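<p>A minimal host-side sketch of the two-range loop, assuming plain C++ iterators, may help when checking device results on the CPU. It also makes one property of the loop visible: only the first range is bounded by an end iterator, so the second range must hold at least as many elements as the first. <code>binary_transform_reference</code> and <code>add_vectors</code> are hypothetical names and not part of the Taskflow API.</p><pre class="m-code">#include &lt;cassert&gt;
#include &lt;vector&gt;

// Sequential host-side model of the two-range loop (hypothetical helper).
template &lt;typename I1, typename I2, typename O, typename C&gt;
O binary_transform_reference(I1 first1, I1 last1, I2 first2, O output, C op) {
  while (first1 != last1) {
    *output++ = op(*first1++, *first2++);
  }
  return output;  // one past the last element written
}

// Hypothetical usage: element-wise sum of two host vectors.
inline std::vector&lt;int&gt; add_vectors(const std::vector&lt;int&gt;&amp; a,
                                    const std::vector&lt;int&gt;&amp; b) {
  // The loop reads one element of b per element of a.
  assert(b.size() &gt;= a.size());
  std::vector&lt;int&gt; out(a.size());
  binary_transform_reference(a.begin(), a.end(), b.begin(), out.begin(),
                             [](int x, int y) { return x + y; });
  return out;
}</pre>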
          </div></section>
          <section class="m-doc-details" id="a7c6ca7be2b6908e8f71570c54303ba9e"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename I1, typename I2, typename O, typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a7c6ca7be2b6908e8f71570c54303ba9e" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              I1 first1,
              I1 last1,
              I2 first2,
              O output,
              C c)</span></span>
            </h3>
            <p>updates parameters of a kernel task created from <a href="classtf_1_1cudaFlow.html#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></p>
<p>The iterator and callable types must match those of the task originally created by <a href="classtf_1_1cudaFlow.html#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a>.</p>
          </div></section>
          <section class="m-doc-details" id="a89c389fff64a16e5dd8c60875d3b514d"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc-self">capture</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable)</span></span>
            </h3>
            <p>constructs a subflow graph through <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a></p>
            <table class="m-table m-fullwidth m-flat">
              <thead>
                <tr><th colspan="2">Template parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td style="width: 1%">C</td>
                  <td>callable type constructible from <code>std::function&lt;void(tf::cudaFlowCapturer&amp;)&gt;</code></td>
                </tr>
              </tbody>
              <thead>
                <tr><th colspan="2">Parameters</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>callable</td>
                  <td>the callable to construct a capture flow</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <th>Returns</th>
                  <td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
                </tr>
              </tfoot>
            </table>
<p>A captured subflow forms a subgraph of the cudaFlow and can capture custom (or third-party) kernels that cannot be constructed directly from the cudaFlow.</p><p>Example usage:</p><pre class="m-code"><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&amp;</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span>

<span class="w">  </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">my_kernel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">my_arguments</span><span class="p">);</span>

<span class="w">  </span><span class="c1">// create a flow capturer to capture custom kernels</span>
<span class="w">  </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">my_subflow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">capture</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlowCapturer</span><span class="o">&amp;</span><span class="w"> </span><span class="n">capturer</span><span class="p">){</span>
<span class="w">    </span><span class="n">capturer</span><span class="p">.</span><span class="n">on</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">cudaStream_t</span><span class="w"> </span><span class="n">stream</span><span class="p">){</span>
<span class="w">      </span><span class="n">invoke_custom_kernel_with_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">,</span><span class="w"> </span><span class="n">custom_arguments</span><span class="p">);</span>
<span class="w">    </span><span class="p">});</span>
<span class="w">  </span><span class="p">});</span>

<span class="w">  </span><span class="n">my_kernel</span><span class="p">.</span><span class="n">precede</span><span class="p">(</span><span class="n">my_subflow</span><span class="p">);</span>
<span class="p">});</span></pre>
          </div></section>
          <section class="m-doc-details" id="aa0f182dc0fa99bcc9118311925fddca5"><div>
            <h3>
              <div class="m-doc-template">
                template&lt;typename C&gt;
              </div>
              <span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#aa0f182dc0fa99bcc9118311925fddca5" class="m-doc-self">capture</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
              C callable)</span></span>
            </h3>
            <p>updates the captured child graph</p>
<p>The method is similar to <a href="classtf_1_1cudaFlow.html#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc">tf::<wbr />cudaFlow::<wbr />capture</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea46be697979903d784a70aeec45eb14ad" class="m-doc">tf::<wbr />cudaTaskType::<wbr />SUBFLOW</a>. The new captured graph must be topologically identical to the original captured graph.</p>
          </div></section>
        </section>
      </div>
    </div>
  </div>
</article></main>
<div class="m-doc-search" id="search">
  <a href="#!" onclick="return hideSearch()"></a>
  <div class="m-container">
    <div class="m-row">
      <div class="m-col-m-8 m-push-m-2">
        <div class="m-doc-search-header m-text m-small">
          <div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
          <div id="search-symbolcount">&hellip;</div>
        </div>
        <div class="m-doc-search-content">
          <form>
            <input type="search" name="q" id="search-input" placeholder="Loading &hellip;" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
          </form>
          <noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
          <div id="search-help" class="m-text m-dim m-text-center">
            <p class="m-noindent">Search for symbols, directories, files, pages or
            modules. You can omit any prefix from the symbol or file path; adding a
            <code>:</code> or <code>/</code> suffix lists all members of given symbol or
            directory.</p>
            <p class="m-noindent">Use <span class="m-label m-dim">&darr;</span>
            / <span class="m-label m-dim">&uarr;</span> to navigate through the list,
            <span class="m-label m-dim">Enter</span> to go.
            <span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
            copy a link to the result using <span class="m-label m-dim">⌘</span>
            <span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
            <span class="m-label m-dim">M</span> produces a Markdown link.</p>
          </div>
          <div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
          <ul id="search-results"></ul>
        </div>
      </div>
    </div>
  </div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
  <div class="m-container">
    <div class="m-row">
      <div class="m-col-l-10 m-push-l-1">
        <p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018&ndash;2024.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.9.6 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
      </div>
    </div>
  </div>
</nav></footer>
</body>
</html>
